Abstract: We propose a novel RF signal classification method
based on sparse coding, a popular technique for image recognition
in machine learning. We treat sparse coding as a configurable
framework and employ a convolutional sparse coder that extracts
the maximal similarity between samples of an unknown received
signal and an overcomplete dictionary of matched filter
templates. Such a dictionary can be either generated or learned via
unsupervised algorithms. Under this approach, we can achieve
blind signal classification with no prior knowledge about signals
(e.g., MCS, pulse shaping) in an arbitrary RF channel. Since
modulated RF signals undergo pulse shaping to aid matched
filter detection by a receiver for the same radio protocol, we can
exploit variability in relative similarity against the dictionary
atoms as the key discriminating factor to build our classifiers.
We present empirical validation of the proposed classification
method. Our results indicate that we can separate different
classes of digitally modulated signals from blind sampling with
70.3% recall and 24.6% false alarm at 10 dB SNR. If a labeled
dataset is available for supervised classifier training, we can
enhance the classification accuracy to 87.8% recall and 14.1%
false alarm.

I.  Introduction 

Cognitive radios have emerged over the last decade as a new
means to share radio spectrum, the most expensive resource
in building a wireless network. For commercial applications,
Dynamic Spectrum Access (DSA) [1] presents a compelling
model to improve the utility of radio spectrum resources.
Much of contemporary research has viewed cognitive radios
as the secondary user of a licensed channel and focused
on developing mechanisms to opportunistically access the
channel with maximal spectral efficiency.

While commercial opportunities are promising, the applicability
of cognitive radios to tactical networking seems even
more compelling. The primary argument for tactical cognitive
radio systems is intelligent decision making that can enhance
resiliency in a hostile, fiercely competitive radio environment.
A significant amount of previous work has been
devoted to algorithmic approaches for a cognitive strategy
layer, ranging from game-theoretic frameworks [2]-[5] to
sequential decision making [6]-[8].

These approaches have provided a strong foundation for
cognitive tactical radio systems, yet their performance depends
heavily on lower-layer capabilities such as sensing, detection,
and inference of radio signals. Our claim is that, in order to
operate the cognitive strategy layer, we require
intelligent sensing mechanisms enabled by learning. In this
paper, we focus on the development of such mechanisms.
In particular, we use sparse coding [9], a feature learning
technique widely used in machine learning, to perform blind
and semi-supervised signal classification for cognitive radios.

Our methods are new and unconventional in the field of
signal detection and estimation. They can learn over
time after bootstrapping with no prior knowledge about the RF
signals of interest and achieve a 70.3% recall for blind signal
classification under a reasonably good SNR. If a labeled
dataset is available for semi-supervised training, our classifiers
achieve an 87.8% recall with 14.1% false
alarms, all without any protocol-specific knowledge about the
modulation of radio signals.

The rest of the paper is organized as follows. In Section
II, we provide background on sparse coding.
In Section III, we describe a discriminative framework that
employs sparse coding as the primary means to extract features
from raw data in a classification pipeline. Section IV
presents our RF signal classification methods: we propose a
method for blind signal classification before presenting a
semi-supervised approach for when a labeled dataset is available.
Section V extends the approach with shift-invariant sparse
coding. We evaluate the proposed classification methods in Section
VI. In Section VII, we discuss related work, and Section VIII
concludes the paper.

II.  Sparse  Coding  Background 

This  section  presents  a  background  on  sparse  coding  and 
dictionary  learning. 

A.  Sparse  Coding 

Sparse  coding  [9]  is  an  unsupervised  method  to  learn  a 
dictionary  of  overcomplete  bases  that  represent  data  efficiently. 
Each  basis  vector  in  the  dictionary  is  also  known  as  an  atom. 
The  mathematical  objective  of  sparse  coding  is  to  describe  an 
input  vector  as  a  sparse  linear  combination  of  the  basis  vectors 
from  the  dictionary. 

Fig. 1 explains the sparse coding problem. Given an $N$-dimensional
input $x \in \mathbb{R}^N$ and dictionary $D \in \mathbb{R}^{N \times K}$,
sparse coding seeks a sparse representation $y \in \mathbb{R}^K$ that
minimizes the loss function

$$J(x, D) = \min_{y \in \mathbb{R}^K} \frac{1}{2} \|x - Dy\|_2^2 + \lambda \psi(y), \tag{1}$$

where  the  first  term  optimizes  the  reconstructive  error,  and  the 
second  term  is  due  to  regularization  to  control  sparsity  of  y. 
Fig. 1. Sparse coding


Sparse coding is typically considered under two regularization
strategies. First, we can adopt the $\ell_0$ penalty, i.e., $\lambda \|y\|_0$,
to strictly regulate the total number of nonzero elements in $y$.
Although the $\ell_0$-norm of $y$ gives precise control, the resulting
minimization is known to be NP-hard [10].

The second approach to regularization resorts to a convex
relaxation of the first. Instead of the computationally hard
$\ell_0$ minimization, we can use the $\ell_1$ penalty term $\lambda \|y\|_1$.
This is also known as basis pursuit [11] or the least absolute
shrinkage and selection operator (LASSO) [12]. It is well known
that $\ell_1$-minimization yields a sparse solution that, under
suitable conditions on the dictionary, matches its $\ell_0$
counterpart, but there is no analytic link between the
value of $\lambda$ for the $\ell_1$-norm and $\|y\|_0$ [13].

1) Orthogonal Matching Pursuit: While exact $\ell_0$-minimization
is hard, approximate solutions are possible. In particular, fast
greedy algorithms select the dictionary atoms sequentially while
explicitly enforcing a sparsity requirement such as an
$S$-sparse $y$:

$$y = \arg\min_{y} \|x - Dy\|_2^2 \quad \text{s.t.} \quad \|y\|_0 \le S. \tag{2}$$

Orthogonal Matching Pursuit (OMP) [14] proceeds in rounds. In each
round, it selects the dictionary atom with the largest inner product
with the current residual (initially the input $x$) and then uses
least squares to re-fit the coefficients of all selected atoms in $y$.
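The greedy selection and least-squares refit can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy, real-valued data, and unit-norm atoms; the names `omp` and `support` are illustrative, and this is not the implementation evaluated in this paper.

```python
import numpy as np

def omp(x, D, S):
    """Greedy S-sparse approximation of x over the columns (atoms) of D."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(D.shape[1])
    support = []                      # indices of atoms selected so far
    residual = x.copy()
    for _ in range(S):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:              # no new atom helps; stop early
            break
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        y[support] = coef
    return y

# Toy usage: x is built from two atoms of a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)        # normalize each atom
x = 2.0 * D[:, 3] - 1.5 * D[:, 17]
y = omp(x, D, S=2)
```

The returned code has at most S nonzero entries. Whether the two generating atoms are recovered exactly depends on how correlated the atoms are with one another.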

2) Basis Pursuit: The $\ell_1$-minimization for the sparse coding
problem can be written as

$$y = \arg\min_{y} \|y\|_1 \quad \text{s.t.} \quad Dy = x. \tag{3}$$

One approach for this optimization is linear programming [15].
Eq. (3), however, is not in the standard form of a linear program

$$\min c^T y \quad \text{s.t.} \quad Dy = x, \; y \ge 0.$$

Chen, Donoho & Saunders [11] recommend making the following
translations:

$$y \Leftrightarrow (u, v), \quad c^T \Leftrightarrow (\mathbf{1}^T, \mathbf{1}^T), \quad D \Leftrightarrow (D, -D).$$

Subsequently, solving

$$\min \mathbf{1}^T u + \mathbf{1}^T v \quad \text{s.t.} \quad Du - Dv = x, \; u, v \ge 0$$

gives the $\ell_1$-minimization solution $y = u - v$ via linear programming.
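A short worked argument shows why this substitution reproduces the $\ell_1$ objective. It is the standard positive/negative-part construction, written here in the notation of Eq. (3) as an illustration rather than as part of the original derivation.

```latex
% Split y into its positive and negative parts (element-wise):
u_i = \max(y_i, 0), \qquad v_i = \max(-y_i, 0), \qquad y = u - v, \qquad u, v \ge 0.

% At most one of u_i, v_i is nonzero, so |y_i| = u_i + v_i and
\|y\|_1 = \sum_i |y_i| = \sum_i (u_i + v_i) = \mathbf{1}^T u + \mathbf{1}^T v.

% The equality constraint becomes linear in the stacked variable (u, v):
Dy = Du - Dv = (D, -D) \begin{pmatrix} u \\ v \end{pmatrix} = x.
```

At an optimum of the linear program, $u_i$ and $v_i$ are never both positive, since subtracting the smaller of the two from both keeps $Du - Dv$ unchanged while lowering the objective.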


B.  Dictionary  Learning 

How  can  we  learn  a  dictionary  for  sparse  coding?  A  dictio¬ 
nary  is  trained  by  an  unsupervised  learning  algorithm  such  as 
K-means  clustering.  A  classical  approach  [16]  examines  the 
projected  first-order  stochastic  gradient  descent  in  a  sequence 
of  updates  for  D 


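$$D_t = \Pi_C \left[ D_{t-1} - \rho \nabla_D J(x_t, D_{t-1}) \right], \tag{4}$$

where $\rho$ is the gradient step, $\Pi_C$ is the orthogonal projector
onto the constraint set $C$, and $\{x_k\}_{k=1}^{T}$ are the unlabeled
training examples.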

In principal component analysis (PCA), we learn a complete
set of basis vectors, i.e., a square matrix of eigenvectors.
Dictionary learning for sparse coding aims to learn an
overcomplete set of basis vectors such that the column dimension
of $D$ is larger than its row dimension. (Recall $D \in \mathbb{R}^{N \times K}$,
so $K > N$.) The advantage of having an overcomplete basis
is that we can better capture structures and patterns inherent
in the input data.

K-SVD [17] is a fast iterative algorithm for PCA-like basis
learning. The inner loop of K-SVD has two phases. First, it
performs batch sparse coding with the current dictionary. Using
the notation $X = [x_1 \dots x_T]$, the batch sparse coding yields
the corresponding matrix of sparse codes $Y = [y_1 \dots y_T]$
such that $X \approx DY$. In the next phase, K-SVD updates
each dictionary atom in $D$ by a rank-1 update via singular
value decomposition of the residual matrix for the atom. K-SVD
also updates each sparse code in $Y$ accordingly. The K-SVD
optimization is given by

$$\min_{D, Y} \|X - DY\|_F^2 \quad \text{s.t.} \quad \|y_k\|_0 \le S \;\; \forall k. \tag{5}$$


Because of the batch sparse coding phase, K-SVD requires
a sparse coder. We can use OMP, $\ell_1$-minimization via linear
programming, or LASSO.


III.  Discriminative  Sparse  Coding  Framework 

In  this  section,  we  present  a  discriminative  sparse  coding 
framework  to  build  a  high-performance  classification  pipeline. 
We explain an unsupervised feature learning method based on
sparse coding and dictionary training. Given the learned
feature mapping, we describe how we can perform feature
extraction, train classifiers, and predict a class label.

A.  Unsupervised  Feature  Learning  via  Sparse  Coding 

Typically,  an  unsupervised  method  is  used  to  learn  a  feature 
representation  of  raw  data.  Since  the  feature  mapping  should 
be  generally  applicable  and  descriptive  of  all  classes  of  data, 
feature  learning  takes  in  randomly  mixed,  unlabeled  training 
examples.  Sparse  coding  and  dictionary  training  provide  an 
unsupervised  feature  learning  algorithm  that  consists  of  the 
following  steps  as  illustrated  in  Fig.  2: 

1) Form input patches $x$ from measured/received signal
data that carry no class labels;

2) (Optionally) apply preprocessing such as normalization
and whitening;

3) Learn a feature mapping via joint sparse coding (compute
$y$) and dictionary ($D$) training.

In summary, unsupervised feature learning takes the unlabeled
dataset $X = [x_1 \dots x_T]$ of random input patches
(each with dimension $N$), performs sparse coding and
dictionary learning, and yields a function $f_{ext}: \mathbb{R}^N \to \mathbb{R}^K$.
The transformation via $f_{ext}$ converts the raw data input $x$ to a
sparse code $y$ in the feature space learned by sparse coding
and dictionary training. For classification, we use the sparse
code $y$ as a feature vector whose $K$ elements are features of
the input $x$ according to dictionary $D$.


[Figure 2: an unlabeled data stream is cut into input patches of
size N, which feed dictionary learning (K-SVD, K-means, K-medoids)]

Fig. 2. Unsupervised learning via sparse coding and dictionary training

B.  Supervised  Classifier  Training 

A representational feature mapping learned from the unsupervised
method plays a crucial role in classification tasks.
Having the feature mapping alone, however, is usually insufficient
to classify data. Classifiers take a feature vector as the
input, and they must be told the ground truth
class (i.e., supervision) of each input feature. Therefore,
classifier training is typically done by a supervised method
such as logistic regression [18] or a support vector machine
(SVM) [19]. Supervised classifier training is depicted in Fig. 3.
Note the labeled input $\{x_i, l_i\}$, where $l_i$ designates the class
label for an input $x_i$.


Fig. 3. Supervised classifier training

In the context of tactical networking scenarios, it may
be too optimistic to assume the availability of a labeled dataset
for supervised classifier training, because little can be assumed
a priori about an adversarial radio network. Signal
examples of the adversary could be hard to acquire for
pre-analysis before a field operation. However, we can assume
plenty of signal examples for the friendly network. With only
friendly network signal examples, one can employ a one-class
classifier [20] instead.

C. Subsampling Features with Max Pooling

In Fig. 3, feature vectors (i.e., sparse codes $y$) go through
one more processing step known as max pooling before
being input to a classifier under training. If all feature
vectors resulting from a stream of input vectors were used
directly for classification, we could overwhelm the
classifier training. The dimensionality of feature vectors is
highly correlated with the complexity of classifiers. Usually, a
complex classification model leads to classifier overfitting, which
shows up as a discrepancy in the classification results between the
training and test datasets. It is therefore customary to reduce
the number of extracted features by subsampling.

Pooling, popular in convolutional neural networks [21],
operates over multiple (sparse) feature representations and
aggregates them into a higher level of features in reduced dimension.
Pooling is not meant to discard useful information.
An important property of the pooled feature representation
is translation invariance. Max pooling [22] takes the maximum
value for the elements in the same position over a
group of feature vectors. For example, consider max pooling
of $L$ sparse codes $\{y_1, y_2, \dots, y_L\}$ that yields the pooled
feature vector $z$ as in Fig. 4. Noting $y_k = [y_{k,1} \dots y_{k,K}]$
and $z = [z_1 \dots z_K]$, the max pooling operation is given by

$$z_j = \max(y_{1,j}, y_{2,j}, \dots, y_{L,j}).$$

[Figure 4: the L sparse codes y_1, ..., y_L, each with K elements,
are stacked row by row; the pooled vector z takes the column-wise
maximum, e.g., z_1 = max(y_{1,1}, y_{2,1}, ..., y_{L,1}) and
z_2 = max(y_{1,2}, y_{2,2}, ..., y_{L,2})]

Fig. 4. Max pooling of L sparse codes

IV.  RF  Signal  Classification  with 
Convolutional  Sparse  Coder 

This section introduces a new method for RF signal classification
based on feature extraction via sparse coding. We
consider two scenarios. In the first scenario, there is no
labeled dataset for supervised classifier
training. Here, we rely completely on unsupervised learning
by sparse coding and dictionary training. The first scenario
can be considered as blind source separation in the feature
domain. In the second scenario, our approach is based on the
semi-supervised learning framework.

A.  Sparse  Coding  Setup 

Our view on sparse coding is that it is a customizable
framework for feature extraction. The sparse coding setup in
Fig. 1 is a realization based on matching or basis pursuit that
emphasizes reconstructive representation with regularization
on sparsity. For discriminative purposes, an OMP sparse coder
evaluates the membership of a given input $x$ to each dictionary
atom with the inner product. Conceptually, this is analogous
to the way that K-means clustering evaluates the Euclidean
distance between data and a cluster centroid, or that the
Gaussian mixture model computes the posterior probabilities
given $x$.

Since our eventual goal is classification, we want to
configure the sparse coding framework with the most
suitable metric that examines the correlation of a received
RF measurement to our dictionary atoms. We propose a
convolutional sparse coder that maps an input vector $x$
($N$ samples of the received signal) to the feature $y \in \mathbb{R}^K$ with
respect to the matched filter templates in dictionary $D$. The
$i$th element in $y$ is given by

$$y_i = \max |x * d_i|, \tag{6}$$

where $*$ denotes the convolution operator, and $d_i$ the $i$th
dictionary atom. We impose a similar sparsity regularization on $y$,
leaving only the $S$ largest values of $y$ as they are and setting the
rest to zero.
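Eq. (6) and the top-S thresholding can be illustrated with a short Python sketch. It assumes NumPy, full linear convolution, and atoms stored as the columns of `D`; the function name `conv_sparse_code` is illustrative, and this is not the implementation evaluated in this paper.

```python
import numpy as np

def conv_sparse_code(x, D, S):
    """Eq. (6): y_i = max |x * d_i|, then keep only the S largest entries."""
    K = D.shape[1]
    y = np.empty(K)
    for i in range(K):
        # Full linear convolution of the patch with the i-th atom;
        # the magnitude handles complex-valued samples as well.
        y[i] = np.max(np.abs(np.convolve(x, D[:, i])))
    keep = np.argsort(y)[K - S:]       # indices of the S largest responses
    sparse = np.zeros(K)
    sparse[keep] = y[keep]
    return sparse

# Toy usage: a short rectangular pulse against three small atoms.
x = np.array([1.0, 1.0, 0.0, 0.0])
D = np.array([[1.0, 1.0, 0.5],
              [1.0, -1.0, 0.0]])       # atoms are the columns of D
y = conv_sparse_code(x, D, S=2)
```

In the toy usage the first atom has the same shape as the pulse and therefore produces the largest response, which is the discriminating behaviour the coder relies on.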

The underlying principle behind our setup is matched filtering.
Mathematically, each nonzero element in the convolutional
sparse code $y$ reflects the maximum correlation between the
input $x$ and the corresponding dictionary atom, which is some
matched filter.

What  are  the  matched  filter  templates  that  constitute  our 
dictionary?  Radio  protocols  employ  specific  pulse  shaping  to 
aid  effective  detection  of  known  signals  for  the  receiver.  This 
pulse  shaping  function  defines  a  matched  filter  template. 

B.  Blind  Signal  Classification 

For blind signal classification, we perform K-means clustering
on the sparse codes. This is essentially blind source (signal)
separation performed in the feature domain.

C.  Semi-supervised  Signal  Classification 

If both unlabeled and labeled datasets are available, we
can use a semi-supervised method for signal classification.
First, we perform unsupervised feature learning via sparse
coding and dictionary training. Given the learned dictionary,
we train linear 1-vs-all SVM classifiers. Assuming a multiclass
classification problem with $M$ classes, each SVM is trained to
classify signal class $j$ against all classes $k \neq j$, for $j, k = 1, \dots, M$.

At  runtime,  we  take  sample  measurements,  perform  sparse 
coding  and  subsampling  of  sparse  codes  by  max  pooling,  and 
predict  the  signal  class  label  using  the  pooled  sparse  code. 

V.  Improvement  via  Shift-invariant  Sparse  Coding 

In the previous section, we described a blind signal
classification method based on sparse coding. We extend our
baseline approach with shift-invariant sparse coding (SISC),
which can compensate for all possible shifts of a blindly sampled
signal. SISC leads to more efficient basis
functions that can be learned from unlabeled received signal
samples. We argue that the slightly higher-level representations
learned through SISC provide useful features to discriminate
one RF signal from another.


A.  Preliminaries 

Clustering time-series data is hard. It is well known that, unless
the countless possible time shifts are addressed, the clusters
extracted from (sub)sequences of a time series are close to
random [23]. This makes unsupervised learning of efficient
basis representations by plain sparse coding fundamentally
flawed. Shift-invariant sparse coding (SISC) [24], [25] is
an extension of sparse coding that can accommodate possible
time offsets of time-series data such as a radio signal. The
SISC optimization problem is formulated as follows

$$\min_{y, d} \sum_{i} \Big\| x^{(i)} - \sum_{j=1}^{K} d_j * y^{(i,j)} \Big\|_2^2 + \lambda \sum_{i,j} \|y^{(i,j)}\|_1 \quad \text{s.t.} \quad \|d_j\|_2^2 = 1, \; 1 \le j \le K, \tag{7}$$

where $y^{(i,j)}$ represents the coefficients corresponding to the
time-series input $x^{(i)}$ (i.e., received signal) and the $j$th basis
vector $d_j$ from $D$.

By expanding the convolution sum in the equation,
we can treat the SISC problem as a very large sparse
coding problem. Unfortunately, doing so is computationally
infeasible even for a moderate problem size.
The existing algorithms [24]-[26] rely on heuristic solutions.
We adopt the approach by Grosse et al. [27], which is
explained in detail below.

B.  Efficient  SISC  Algorithm 

In general, a sparse coding algorithm alternates between two
convex optimization problems: 1) compute the sparse codes $y$
by fixing the basis vectors $d$ in dictionary $D$ and 2) update the
basis vectors by fixing the sparse codes. The key challenge in solving
Eq. (7) is that each basis vector can appear in any possible
shift, making every element in a basis vector contribute to
many different terms in the objective function.

Grosse et al. [27] propose a technique that transforms
Eq. (7) into the frequency domain. This transformation
eliminates the intractable problem in the time domain and
replaces it with a new problem that is only mildly more difficult
than classical sparse coding. Let us denote the discrete Fourier
transform (DFT) of the basis vectors by $D' = (d'_1, \dots, d'_K)$. Note
that each $d'_j$ is complex. Parseval's theorem implies that the
DFT of $d$ scales its $\ell_2$ norm by a constant factor, say $a$. Also,
the Fourier transform of a convolution of two vectors is the
element-wise product of the Fourier transforms of the two.
Therefore, we can reduce the optimization in Eq. (7) to

$$\min_{d'} \sum_{i} \Big\| x'^{(i)} - \sum_{j} d'_j \odot y'^{(i,j)} \Big\|_2^2 \quad \text{s.t.} \quad \|d'_j\|_2^2 = a, \tag{8}$$

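where $x'$ and $y'$ are the DFTs of the input and its sparse code.
The optimization is now over the vectors of complex-valued
DFTs, and the Lagrangian can be decomposed as a sum of
quadratic terms on each frequency component $k$

$$\mathcal{L}(d', \gamma) = \sum_{k} \left( \|x'_k - Y'_k d'_k\|_2^2 + d'^{*}_k \Gamma d'_k \right) - a \mathbf{1}^T \gamma \tag{9}$$

with dual variables $\gamma \in \mathbb{R}^K$. Here $x'_k$ and $d'_k$ collect the
$k$th frequency components of all inputs and of all basis vectors,
respectively. Note that $\Gamma = \mathrm{diag}(\gamma)$ and

$$Y'_k = \begin{pmatrix} y'^{(1,1)}_k & y'^{(1,2)}_k & \cdots \\ y'^{(2,1)}_k & y'^{(2,2)}_k & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.$$

The Lagrangian $\mathcal{L}(d', \gamma)$ can
be written as a function of only real variables using the real
and imaginary parts of $d'$. Furthermore, we can analytically
compute $d'^{\min} = \arg\min_{d'} \mathcal{L}(d', \gamma)$:

$$d'^{\min}_k = \left( Y'^{*}_k Y'_k + \Gamma \right)^{-1} Y'^{*}_k x'_k. \tag{10}$$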

In  this  paper,  we  derive  the  dual  optimization  problem  and 
optimize  a  solution  over  the  dual  variables  7  via  Newton’s 
method.  When  the  optimal  7  is  computed,  the  shift-invariant 
dictionary  basis  d'  can  be  recovered  by  Eq.  (10). 


VI.  Evaluation 

We evaluate the proposed blind and semi-supervised signal
classification methods in MATLAB.


A.  Signals 

We assume a mixture of known and unknown signals in an
arbitrary radio channel. If the receiver detects an energy level
above a certain threshold, it samples the channel according to its
bandwidth and stores the measurement for further processing,
which will be explained in the next section. We have two classes
of known signals, namely S1 and S2. By known signals, we
mean that we know the pulse shaping applied
during the baseband modulation and can perform matched
filter detection. The known signals are:

• (S1) Single-carrier: QPSK with rectangular pulse;

• (S2) OFDM: QPSK modulated on-carriers with raised
cosine pulse.

There are also two classes of unknown signals, S3 and
S4, for which we have no knowledge of the pulse shaping. The
unknown signals have the following specifications.

• (S3) Single-carrier: QPSK with unknown custom pulse

$p(t) = \frac{1}{2}\left[1 - \cos(\ldots)\right]$;

• (S4) OFDM: BPSK, QPSK, 16-QAM modulated on-carriers
with $p(t)$.

B.  Experimental  Methodology 

1) Generation and transmission of signals: To generate
signals, we first generate a random data bit stream $b_k$.
The baseband signals are generated according to the following
digital (I-Q) modulation schemes.

• BPSK: $d_{BPSK}(t) = \sum_k b_k \, p(t - kT_b)$

• QPSK: $d_{QPSK}(t) = \sum_k b_{2k} \, p(t - kT_s) + \sum_k b_{2k+1} \, p(t - kT_s)$

• 16-QAM: $d_{QAM}(t) = \sum_k I_k \, p(t - kT_s) + \sum_k Q_k \, p(t - kT_s)$

• OFDM: generated with comm.OFDMModulator in MATLAB

In the QPSK and 16-QAM expressions, the first sum forms the in-phase
component and the second the quadrature component. For 16-QAM,
$I_k$ and $Q_k$ are the in-phase and quadrature amplitudes,
taking values ±1, ±3.
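The pulse-shaped sums above can be illustrated with a short Python sketch for the QPSK case. It assumes NumPy, a bit-to-amplitude mapping of 0 to -1 and 1 to +1, and 8 samples per symbol, none of which are specified in the text; packing I and Q into one complex array is only a convenience. The names `pulse_shape` and `qpsk_symbols` are illustrative, and this is not the MATLAB code used in the experiments.

```python
import numpy as np

def pulse_shape(symbols, pulse, sps):
    """Return sum_k symbols[k] * p(t - k*Ts), sampled at sps samples per symbol."""
    impulses = np.zeros(len(symbols) * sps, dtype=complex)
    impulses[::sps] = symbols                 # one impulse per symbol period
    return np.convolve(impulses, pulse)[: len(impulses)]

def qpsk_symbols(bits):
    """Map bit pairs (b_2k, b_2k+1) to I and Q amplitudes in {-1, +1}."""
    levels = 2.0 * np.asarray(bits) - 1.0     # assumed mapping: 0 -> -1, 1 -> +1
    return levels[0::2] + 1j * levels[1::2]   # real part: I, imaginary part: Q

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=16)            # random data bit stream b_k
sps = 8                                       # samples per symbol (assumed)
rect = np.ones(sps)                           # rectangular pulse, one symbol long
d = pulse_shape(qpsk_symbols(bits), rect, sps)
d_I, d_Q = d.real, d.imag                     # in-phase and quadrature components
```

Swapping `rect` for a sampled raised cosine or custom pulse changes only the `pulse` argument, which is what makes the pulse shape a usable discriminating feature.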

As mentioned earlier, we have used rectangular, raised cosine,
square-root raised cosine, and custom pulse functions for
pulse shaping of the baseband modulated waveforms.
The final carrier-modulated signal is given by

$$s(t) = A_c \left[ d_I(t) \cos(2\pi f_c t) + d_Q(t) \sin(2\pi f_c t) \right],$$

where $f_c$ is the carrier frequency, and $A_c$ the carrier amplitude
gain. The in-phase and quadrature components $d_I(t)$, $d_Q(t)$
are generated according to one of the I-Q modulation schemes
above.

We transmit $s(t)$ through the AWGN channel at 20 dB and
0 dB SNR. Hence, the measurement at a receiver consists of
noisy samples. For every class, we generate two datasets.
There are 1,000 signal examples in each dataset. We use
the first dataset for training, and the other for evaluating
classification performance.
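Adding white Gaussian noise at a target SNR can be sketched as follows. The sketch assumes NumPy, complex baseband samples, and SNR defined as the ratio of average signal power to average noise power; the function name `awgn` is illustrative, and this is not the MATLAB channel model used in the experiments.

```python
import numpy as np

def awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise so the result has the requested SNR."""
    signal = np.asarray(signal, dtype=complex)
    sig_power = np.mean(np.abs(signal) ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power equally between real and imaginary parts.
    scale = np.sqrt(noise_power / 2.0)
    noise = scale * (rng.standard_normal(signal.shape)
                     + 1j * rng.standard_normal(signal.shape))
    return signal + noise

# Toy usage: a constant-envelope signal received at 20 dB and 0 dB SNR.
rng = np.random.default_rng(0)
clean = np.ones(100000, dtype=complex)
received_20db = awgn(clean, 20.0, rng)
received_0db = awgn(clean, 0.0, rng)
```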

2) Data processing and classification pipeline: The data
processing and classification pipeline is depicted in Fig. 5.
The measured signal samples are vectorized into patches of size
$N = 64$. Note that an I-Q modulated signal is complex-valued,
hence the received samples are also complex, i.e., $x \in \mathbb{C}^N$.
We could train the dictionary using the received samples via the
unsupervised K-SVD algorithm (without knowing what their
classes are). However, we take the following generative
approach for dictionary $D$.

1) $D$ has $K = 100$ dictionary atoms

2) Each atom $d_i$ has matching size $N = 64$ and is
complex-valued

3) $D$ is divided into 4 regions: the first 20 atoms belong to
the family of rectangular pulses, the second 20 atoms to the
raised cosine family, and the third 20 atoms to the square-root
raised cosine family; the last 40 atoms are randomly generated

We use a convolutional sparse coder whose operation is
described by Eq. (6) given a patch $x$ of the received signal
samples. Sparse code $y$ has dimension $K$ and is real-valued
($\mathbb{R}^K$). We use the max pooling factor $M = 4$. Note that the
pooled feature vector $z$ has the same dimension as $y$ and is
also real.
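Max pooling with factor M can be sketched in Python as below. The sketch assumes NumPy and non-overlapping groups of M consecutive sparse codes, which the text does not specify; the function name `max_pool` is illustrative.

```python
import numpy as np

def max_pool(codes, M):
    """Element-wise max over consecutive groups of M sparse codes.

    codes has shape (num_codes, K); the result has shape (num_codes // M, K).
    """
    codes = np.asarray(codes)
    n_groups = codes.shape[0] // M            # drop an incomplete trailing group
    grouped = codes[: n_groups * M].reshape(n_groups, M, -1)
    return grouped.max(axis=1)

# Toy usage: four sparse codes with K = 3, pooled in pairs (M = 2).
codes = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0]])
z = max_pool(codes, M=2)
```

Each pooled row keeps the dimension K of the sparse codes while reducing the number of feature vectors by a factor of M.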


[Figure 5: the received signal x is filtered against K = 100 matched
filter templates (20 rectangular, 20 raised cosine, 20 square-root
raised cosine, and 40 random templates) via
y_i = max |conv(fliplr(h_i), x)| to form the feature vector y, which
is max pooled by M = 4 into the pooled feature z]

Fig. 5. Sparse coding setup with convolution for RF signal classification

In summary, the feature transformation $x \to y \to z$ takes
place by sparse coding of sequentially fed raw input patches
followed by max pooling. The pooled feature vector $z$ is used
for classification.

3) SVM classifier training: Fig. 6 explains 1-vs-all SVM
training. We have trained two linear SVM classifiers using the
pooled feature vectors $z$. The first SVM classifies the signal
class S1 against all others. For this, we prepare labeled datasets
$\{z_{S1}^{(i)}, +1\}_i$ and $\{z_{S2 \cup S3 \cup S4}^{(i)}, -1\}_i$. Similarly, the second
SVM, which classifies S2 against all others, is trained with
labeled datasets $\{z_{S2}^{(i)}, +1\}_i$ and $\{z_{S1 \cup S3 \cup S4}^{(i)}, -1\}_i$.
[Figure 6: two linear SVMs; the first is trained on {z_S1, +1} and
{z_(S2∪S3∪S4), -1} and classifies S1 against others; the second is
trained on {z_S2, +1} and {z_(S1∪S3∪S4), -1} and classifies S2
against others]

Fig. 6. 1-vs-all SVM training for semi-supervised approach


4)  Evaluation  metric:  We  compute  recall  or  true  positive 
rate  (TPR)  and  false  alarm  rate  (FPR)  to  evaluate  classification 
accuracy. 


$$\text{Recall} = \frac{\sum \text{True positives}}{\sum \text{True positives} + \sum \text{False negatives}}$$

$$\text{False alarm} = \frac{\sum \text{False positives}}{\sum \text{False positives} + \sum \text{True negatives}}$$
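The two metrics can be computed for one class with a few lines of plain Python. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this sketch.

```python
def recall_and_false_alarm(y_true, y_pred, positive):
    """Return (recall, false alarm rate), treating `positive` as the target class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1          # target signal correctly detected
            else:
                fn += 1          # target signal missed
        else:
            if p == positive:
                fp += 1          # other signal mistaken for the target
            else:
                tn += 1          # other signal correctly rejected
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_alarm

# Toy usage with six test examples and S1 as the class of interest.
y_true = ["S1", "S1", "S1", "S2", "S3", "S4"]
y_pred = ["S1", "S1", "S2", "S1", "S3", "S4"]
recall, false_alarm = recall_and_false_alarm(y_true, y_pred, "S1")
```

Here two of the three S1 examples are detected and one of the three non-S1 examples is flagged as S1.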


C.  Results  and  Discussion 

For blind classification, we examine whether K-means
clustering in the pooled feature ($z$) domain gives a natural
separation. Setting the number of clusters $K = 4$ for K-means
(not to be confused with the number of atoms $K$
in dictionary $D$), we have been able to see the separation
that we seek. By counting the majority signal class and the
minority classes in each cluster, we have computed the recall and
false alarms. Table I summarizes the accuracy of the blind signal
classification via K-means.
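One way to turn cluster assignments into class predictions is to label each cluster with its majority class, as in the plain-Python sketch below. The exact counting protocol is not spelled out in the text, so this rule and the function name are assumptions.

```python
from collections import Counter

def majority_label_predictions(cluster_ids, true_labels):
    """Label every cluster with its most common true class."""
    by_cluster = {}
    for c, t in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    # The majority class of each cluster becomes its predicted label.
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in by_cluster.items()}
    return [majority[c] for c in cluster_ids]

# Toy usage: two clusters, each containing one minority example.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["S1", "S1", "S2", "S2", "S2", "S1"]
predicted = majority_label_predictions(cluster_ids, true_labels)
```

The minority examples in each cluster count as errors: they are misses for their own class and false alarms for the cluster's majority class.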

The blind classification performance is reasonably good
considering that we do not use any prior knowledge about
these signals, yet we can achieve up to a 70.3% recall at
24.6% false alarm at SNR = 10 dB. If a labeled dataset is
available for semi-supervised training, our classifiers
achieve an 87.8% recall with 14.1% false alarms, all
without any protocol-specific knowledge about the modulation of
radio signals. In Fig. 7, we present the confusion matrix for the
1-vs-all SVM classifier trained using signal samples received
at SNR = 10 dB. Similarly, the confusion matrix for the SVM
classifier trained under SNR = 0 dB is presented in Fig. 8.


Fig.  7.  Confusion  matrix  for  SNR  =  20  dB  (darkest  box:  0.89,  lightest:  0.06) 


Fig. 8. Confusion matrix for SNR = 0 dB (darkest box: 0.73, lightest: 0.22)


TABLE I
Classification accuracy (values in parentheses are for SNR = 0 dB)

Scenario                             Recall           False alarm
Scenario 1 (Blind classification)    0.703 (0.582)    0.246 (0.367)
Scenario 2 (Semi-supervised)         0.878 (0.726)    0.141 (0.262)

VII.  Related  Work 

Our signal classification methods are inspired by the way
that sparse representations of raw image, audio, and text data
are used in computer vision and pattern recognition. Wright
et al. [28] have developed a recognition system that can
classify an image of a human face using sparse representations
of image segments, which is an idea similar to ours. Pooling
multiple sparse features to make an aggregate representation is
widely studied in computer vision. The original idea of spatial
pooling techniques dates back to Riesenhuber and Poggio [29].
Heisele, Ho, and Poggio [30] explain useful techniques for
applying SVMs to multi-class classification, such as training
a 1-vs-all classifier as in our semi-supervised approach.

VIII.  Conclusion 

We have introduced a blind signal classification method
based on sparse coding. Our method is motivated by an active
area of research in sparse representation learning [31]. With
no prior knowledge or assumptions about a blindly sampled
signal, we take advantage of correlating it with an overcomplete
dictionary of known matched filter templates, which can be
pregenerated or trained by an unsupervised learning algorithm.
This coding process yields a discriminative feature that captures
the variability of correlations measured by convolving
the signal with each dictionary atom.

As our goal is to build a discriminative framework for
classification tasks, we have exploited sparsity by regularizing over
the convolutional filter outputs and leaving only the largest
several values as they are. The empirical results for classification
are promising. We have designed a simulated experiment for
blind classification similar to blind source separation and
found that our method can achieve up to a 70.3% recall at a
24.6% false alarm rate at a reasonable SNR of 10 dB without
any protocol-specific knowledge about the simulated radio signals.
If a labeled dataset is available for supervised training, our
classifiers achieve an 87.8% recall with 14.1%
false alarm.